This paper offers a new approach for estimating and forecasting
the volatility of financial time series. No assumption is made about
the parametric form of the processes. On the contrary, we only
suppose that the volatility can be approximated by a constant over some
interval. In such a framework, the main problem consists of filtering
this interval of time homogeneity; then the estimate of the volatility
can be simply obtained by local averaging. We construct a locally
adaptive volatility estimate (LAVE) which can perform this task and
investigate it both from the theoretical point of view and through
Monte Carlo simulations. Finally, the LAVE procedure is applied to
a data set of nine exchange rates and a comparison with a standard
GARCH model is also provided. Both models appear to be capable
of explaining many of the features of the data; nevertheless, the new
approach seems to be superior to the GARCH method as far as the
out-of-sample results are concerned.

1. Introduction. The aim of this paper is to offer a new perspective for
the estimation and forecasting of the volatility of financial asset returns
such as stocks and exchange rate returns.

A remarkable amount of statistical research is devoted to financial time
series, in particular, to the volatility of asset returns, where the term
volatility indicates a measure of dispersion, usually the variance or the
standard deviation. The interest in this topic is motivated by the needs of
the financial industry, which regards volatility as one of the main
reference numbers
for risk management and derivative pricing.

D. MERCURIO AND V. SPOKOINY

Actually, asset return time series display very peculiar stylized facts,
which are connected with their second moments. Graphically, they look like
white noise, where periods of high and low volatility seem to alternate. Their
density has fat tails if compared to that of a normal random variable, and
they show significantly positive and highly persistent autocorrelation of the
absolute returns, meaning that large (resp. small) absolute returns are likely
to be followed by large (resp. small) absolute returns. Typical examples can
be seen in Section 6, and further details on this topic can be found in Taylor
(1986). Therefore, a white-noise process with time-varying variance is usually
taken to model such features. Let $S_t$ denote the observed asset process. Then
the corresponding (log) returns $R_t = \log(S_t/S_{t-1})$ follow the
heteroscedastic model

$R_t = \sigma_t \xi_t,$

where $\xi_t$ are standard Gaussian independent innovations and $\sigma_t$ is a
time-varying volatility coefficient. It is often assumed that $\sigma_t$ is
measurable w.r.t. the $\sigma$-field generated by the preceding returns
$R_1, \ldots, R_{t-1}$. For modeling
this volatility process, parametric assumptions are usually used. The main 
model classes are the ARCH and GARCH family [Engle (1995)] and the 
stochastic volatility models [Harvey, Ruiz and Shephard (1994)]. A large 
number of papers has followed the first publications on this topic, and 
the original models have been extended in order to provide better explana- 
tions. For example, models which take into account asymmetries in volatility 
have been proposed, such as EGARCH [Nelson (1991)], QGARCH [Sentana 
(1995)] and GJR [Glosten, Jagannathan and Runkle (1993)]; furthermore, 
the research on integrated processes has produced integrated [Engle and Bollerslev
(1986)] and fractionally integrated versions of the GARCH model.

The availability of very large samples of financial data has made it possi- 
ble to construct models which display quite complicated parameterizations 
in order to explain all the observed stylized facts. Obviously, these mod- 
els rely on the assumption that the parametric structure of the process 
remains constant through the whole sample. This is a nontrivial and possi- 
bly dangerous assumption, in particular, as far as forecasting is concerned 
[Clements and Hendry (1998)]. Furthermore, checking for parameter insta- 
bility becomes quite difficult if the model is nonlinear and/or the number 
of parameters is large. Thus, those characteristics of the returns, which are
often explained by the long memory and (fractionally) integrated nature of the
volatility process, could also depend on the parameters being time varying.

In this paper we propose another approach focusing on a very simple 
model but with a possibility for model parameters to depend on time. This 
means that the model is regularly checked and adapted to the data. No 
assumption is made about the parametric structure of the volatility process. 
We only suppose that it can be locally approximated by a constant; that
is, for every time point $\tau$ there exists a past interval $[\tau - m, \tau]$
where the volatility $\sigma_t$ did not vary much. This interval is referred to
as the interval of
time homogeneity. An algorithm is proposed for data-driven estimation of 
the interval of time homogeneity, after which the estimate of the volatility 
can be simply obtained by averaging. 

Our approach is similar to varying-coefficient modeling from Fan and Zhang 
(1999); see also Cai, Fan and Li (2000) and Cai, Fan and Yao (2000). Fan, Jiang, Zhang and Zhou 
(2003) discussed applications of this method to stock price volatility mod- 
eling. The proposed procedure is based on the assumption that the model 
parameters smoothly vary with time and can be locally approximated by a 
linear function of time. This approach has the drawback of not allowing one 
to incorporate structural breaks into the model. 

Change point modeling with applications to financial time series was 
considered in Mikosch and Starica (2000). Kitagawa (1987) applied non- 
Gaussian random walk modeling with heavy tails as the prior for the piece- 
wise constant mean for one-step-ahead prediction of nonstationary time se- 
ries. However, the aforementioned approaches require some essential amount 
of prior information about the frequency of change points and their size. 

The LAVE approach proposed in this article does not assume smooth or 
piecewise constant structure of the underlying process and does not require 
any prior information. The procedure proposed below in Section 3 focuses 
on adaptive choice of the interval of homogeneity that allows one to proceed 
in a unified way with smoothly varying coefficient models and change point 
models. 

The proposed approach attempts to describe the local dynamics of the
volatility process, and it is particularly appealing for short-term
forecasting, which is an important building block, for example, in
value-at-risk and portfolio hedging problems or backtesting [Härdle and Stahl
(1999)].

The remainder of the paper is organized as follows. Section 2 introduces 
the adaptive modeling procedure. Then some theoretical properties are dis- 
cussed in the general situation and for a change point model. A simulation 
study illustrates the performance of the new methodology with respect to 
the change point model. The question of selecting the smoothing parameters 
is also addressed and some solutions are proposed. Finally, the procedure 
is applied to a set of nine exchange rates and it appears to be highly com- 
petitive with standard GARCH(1, 1), which is used as a benchmark model. 
Mathematical proofs are given in Section 8. 

2. Modeling volatility via power transformation. Let $S_t$ be an observed
asset process in discrete time, $t = 1, 2, \ldots, \tau$, and let $R_t$ be the
corresponding returns: $R_t = \log(S_t/S_{t-1})$. We model this process via the
conditional heteroscedasticity assumption

(2.1)    $R_t = \sigma_t \xi_t,$

where $\xi_t$, $t \ge 1$, is a sequence of independent standard Gaussian random
variables and $\sigma_t$ is the volatility process, which is in general a
predictable random process, that is, $\sigma_t \sim \mathcal{F}_{t-1}$ with
$\mathcal{F}_{t-1} = \sigma(R_1, \ldots, R_{t-1})$ (the $\sigma$-field
generated by the first $t - 1$ observations).
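
As an illustration of model (2.1), the following minimal sketch simulates returns for a given volatility path and rebuilds the price process from them. It is not the authors' code: the function names `simulate_returns` and `prices_from_returns`, the seed handling and the starting price `s0` are assumptions made here, and the volatility path is fixed in advance rather than depending on past returns.

```python
import math
import random


def simulate_returns(sigmas, seed=0):
    """Draw R_t = sigma_t * xi_t with independent standard Gaussian xi_t."""
    rng = random.Random(seed)
    return [sigma * rng.gauss(0.0, 1.0) for sigma in sigmas]


def prices_from_returns(returns, s0=1.0):
    """Rebuild S_t from log returns R_t = log(S_t / S_{t-1}), starting at s0."""
    prices = [s0]
    for r in returns:
        prices.append(prices[-1] * math.exp(r))
    return prices


# A constant volatility path gives a geometric Brownian motion
# observed at discrete time moments.
returns = simulate_returns([0.01] * 250, seed=1)
prices = prices_from_returns(returns, s0=100.0)
```

Replacing the constant list by any other sequence of positive numbers gives a time-inhomogeneous example of the same model.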

A time-homogeneous (time-homoscedastic) model means that $\sigma_t$ is a
constant. The process $S_t$ is then a geometric Brownian motion observed at
discrete time moments. The assumption of time homogeneity is too restrictive
in practical applications, and it does not allow one to fit real data very
well. In this paper, we consider an approach based on local time
homogeneity, which means that for every time moment $\tau$ there exists a time
interval $[\tau - m, \tau]$ where the volatility process $\sigma_t$ is nearly
constant. Under such a modeling, the main intention is both to describe the
interval of homogeneity and to estimate the corresponding value
$\sigma_\tau$, which can then be used for one-step forecasting and the like.

2.1. Data transformation. The model equation (2.1) links the target
volatility process $\sigma_t$ with the observations $R_t$ via the
multiplicative errors $\xi_t$. The classical well-developed regression approach
relies on the assumption of additive errors, which can then be smoothed out by
some kind of averaging. A natural and widespread method of transforming
equation (2.1) into a regression-like equation is to square both its sides and
apply the log function:

(2.2)    $\log R_t^2 = \log \sigma_t^2 + \log \xi_t^2,$

which can be rewritten in the form

$\log R_t^2 = \log \sigma_t^2 + C + v\zeta_t,$

with $C = E \log \xi_t^2$, $v^2 = \mathrm{Var} \log \xi_t^2$ and
$\zeta_t = v^{-1}(\log \xi_t^2 - C)$; see, for example, Gourieroux (1997).
This is a usual regression equation with the "response" $Y_t = \log R_t^2$,
target regression function $f(t) = \log \sigma_t^2 + C$ and homogeneous
"noise" $v\zeta_t$.

The main problem with this approach is due to the distribution of the
errors $\zeta_t$, which is highly skewed and gives very high weights to the
small values of the errors $\xi_t$. In particular, this leads to a serious
problem with missing data, which are typically modeled as equal to the
previous values, providing $R_t = 0$.
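
For orientation, the constants $C$ and $v^2$ of the log transformation can be evaluated numerically. The sketch below uses the standard closed forms for the logarithm of a chi-square variable with one degree of freedom, $E \log \xi^2 = -(\gamma_E + \log 2)$ and $\mathrm{Var} \log \xi^2 = \pi^2/2$, where $\gamma_E$ is the Euler-Mascheroni constant (not the power $\gamma$ used elsewhere in the paper), and checks them by simulation. The names `log_square_moments`, `C` and `V2` are hypothetical and chosen here for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

# Closed forms for log(xi^2) with xi standard Gaussian.
C = -(EULER_GAMMA + math.log(2.0))   # E log xi^2
V2 = math.pi ** 2 / 2.0              # Var log xi^2


def log_square_moments(n, seed=0):
    """Monte Carlo estimates of the mean and variance of log(xi^2)."""
    rng = random.Random(seed)
    draws = [math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return mean, var


mc_mean, mc_var = log_square_moments(200_000)
```

The strongly negative mean and the large variance reflect the heavy left tail produced by returns close to zero, which is the difficulty described above.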

Another possibility is based on power transformation [see Carroll and Ruppert
(1988)], which also leads to a regression with additive noise, and this noise
is much closer to a Gaussian one. Due to (2.1), the random variable $R_t$ is,
conditionally on $\mathcal{F}_{t-1}$, Gaussian and

$E(R_t^2 \mid \mathcal{F}_{t-1}) = \sigma_t^2.$

Fig. 1. Density of $p_{1/2}(x)$ (straight line) and the standard normal density (dotted line).

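Similarly, for every $\gamma > 0$,

$E(|R_t|^\gamma \mid \mathcal{F}_{t-1}) = \sigma_t^\gamma E|\xi|^\gamma = C_\gamma \sigma_t^\gamma,$

$E\{(|R_t|^\gamma - C_\gamma \sigma_t^\gamma)^2 \mid \mathcal{F}_{t-1}\} = \sigma_t^{2\gamma} E(|\xi|^\gamma - C_\gamma)^2 = \sigma_t^{2\gamma} D_\gamma^2,$

where $\xi$ denotes a standard Gaussian r.v., $C_\gamma = E|\xi|^\gamma$ and
$D_\gamma^2 = \mathrm{Var}\,|\xi|^\gamma$. Therefore, the process
$|R_t|^\gamma$ allows for the representation

(2.3)    $|R_t|^\gamma = C_\gamma \sigma_t^\gamma + D_\gamma \sigma_t^\gamma \zeta_t,$

where $\zeta_t$ is equal to $(|\xi_t|^\gamma - C_\gamma)/D_\gamma$. Note that the
problem of estimating $\sigma_t$ is in some sense equivalent to the problem of
estimating $\theta_t = C_\gamma \sigma_t^\gamma$, which is the conditional mean
of the transformed process $|R_t|^\gamma$. This is already a kind of
heteroscedastic regression problem with additive errors
$D_\gamma \sigma_t^\gamma \zeta_t$ satisfying

$E(D_\gamma \sigma_t^\gamma \zeta_t \mid \mathcal{F}_{t-1}) = 0, \qquad E(D_\gamma^2 \sigma_t^{2\gamma} \zeta_t^2 \mid \mathcal{F}_{t-1}) = D_\gamma^2 \sigma_t^{2\gamma}.$

A natural choice of the parameter $\gamma$ is $\gamma = 2$, providing nearly
efficient variance estimation under homogeneity. For $\gamma = 2$ one has
$C_\gamma = 1$ and $D_\gamma^2 = 2$. Note, however, that the distribution of the
"errors" $\zeta_t = (|\xi_t|^\gamma - C_\gamma)/D_\gamma$ is still heavy tailed
and highly skewed, which results in a low sensitivity of the method in an
inhomogeneous situation. The other important cases are $\gamma = 1$ and
$\gamma = 1/2$. A minimization of the skewness $E\zeta_t^3$ and the excess
kurtosis $E\zeta_t^4 - 3$ with respect to $\gamma$ leads to the choice
$\gamma \approx 1/2$. The corresponding density $p_{1/2}(x)$ of $\zeta_{1/2}$
together with the standard normal density $\phi(x)$ is plotted in Figure 1.
Our numerical results are also in favor of the choice $\gamma = 1/2$; see
Section 5.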
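
The constants of the power transformation can be computed from the absolute moments of a standard Gaussian variable, $E|\xi|^p = 2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$. The following sketch evaluates $C_\gamma$, $D_\gamma$ and the skewness and excess kurtosis of $\zeta_t$ for the three values of $\gamma$ discussed above. The function names `abs_moment` and `power_transform_constants` are hypothetical; the code only illustrates the comparison and does not reproduce the authors' minimization.

```python
import math


def abs_moment(p):
    """E|xi|^p for a standard Gaussian xi: 2^(p/2) * Gamma((p+1)/2) / sqrt(pi)."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def power_transform_constants(gamma):
    """Return C_gamma, D_gamma and the skewness and excess kurtosis of zeta_t."""
    m1, m2, m3, m4 = (abs_moment(k * gamma) for k in (1, 2, 3, 4))
    c = m1
    d = math.sqrt(m2 - m1 ** 2)
    # Central moments of |xi|^gamma expressed through its raw moments.
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / d ** 3
    excess_kurt = (m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4) / d ** 4 - 3.0
    return c, d, skew, excess_kurt


for g in (0.5, 1.0, 2.0):
    c, d, skew, kurt = power_transform_constants(g)
    print(f"gamma={g}: C={c:.4f} D^2={d * d:.4f} skewness={skew:.3f} excess kurtosis={kurt:.3f}")
```

For $\gamma = 2$ the routine returns $C_\gamma = 1$ and $D_\gamma^2 = 2$, in agreement with the text, and the skewness shrinks in absolute value as $\gamma$ decreases from 2 to 1/2.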



3. Adaptive estimation under local time homogeneity. Here we describe
one approach to volatility modeling based on the assumption of local time
homogeneity, starting from a preliminary heuristic discussion. The assumption
of local time homogeneity means that the function $\sigma_t$ is nearly constant
within an interval $I = [\tau - m, \tau]$, and the process $R_t$ follows the
regression-like equation (2.3) with the constant trend
$\theta_I = C_\gamma \sigma_I^\gamma$, which can be estimated by averaging over
this interval $I$:

(3.1)    $\tilde\theta_I = \frac{1}{|I|} \sum_{t \in I} |R_t|^\gamma.$

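For the particular case $\gamma = 2$ the estimate $\tilde\theta_I$ coincides
with the local maximum likelihood estimator (MLE) of the volatility
$\sigma_I^2$ considered in Fan, Jiang, Zhang and Zhou (2003). As discussed in
the previous section, a smaller value of $\gamma$ might be preferred for
improving the stability of the method. Similarly to Fan, Jiang, Zhang and Zhou
(2003), one can also incorporate one-sided kernel weighting into this
estimator.

By (2.3),

(3.2)    $\tilde\theta_I = \frac{C_\gamma}{|I|} \sum_{t \in I} \sigma_t^\gamma + \frac{D_\gamma}{|I|} \sum_{t \in I} \sigma_t^\gamma \zeta_t = \frac{1}{|I|} \sum_{t \in I} \theta_t + \frac{s_\gamma}{|I|} \sum_{t \in I} \theta_t \zeta_t,$

with $s_\gamma = D_\gamma/C_\gamma$, so that

(3.3)    $E\tilde\theta_I = E \frac{1}{|I|} \sum_{t \in I} \theta_t,$

(3.4)    $\frac{s_\gamma^2}{|I|^2} E\Bigl(\sum_{t \in I} \theta_t \zeta_t\Bigr)^2 = \frac{s_\gamma^2}{|I|^2} E \sum_{t \in I} \theta_t^2.$
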

3.1. Some properties of the estimate $\tilde\theta_I$. Due to our assumption
of local homogeneity, the process $\theta_t$ is close to $\theta_\tau$ for all
$t \in I$. Define also

$\Delta_I = \sup_{t \in I} |\theta_t - \theta_\tau| \quad \text{and} \quad v_I^2 = \frac{s_\gamma^2}{|I|^2} \sum_{t \in I} \theta_t^2.$

The value of $\Delta_I$ measures the departure from homogeneity within the
interval $I$, and it can be regarded as an upper bound of the "bias" of the
estimate $\tilde\theta_I$. The value of $v_I^2$, because of (3.4), will be
referred to as the "conditional variance" of the estimate $\tilde\theta_I$. The
next theorem provides a probability bound for the estimation error, that is,
the deviation of $\tilde\theta_I$ from the present value of the volatility
$\theta_\tau$, in terms of $\Delta_I$ and $v_I$.

Theorem 3.1. Let the volatility coefficient $\sigma_t$ satisfy the condition

(3.5)    $b < \sigma_t^\gamma < bB,$

with some positive constants $b$, $B$. Then there exists $a_\gamma > 0$ such
that, for every $\lambda > 1$,

$P(|\tilde\theta_I - \theta_\tau| > \Delta_I + \lambda v_I) \le 4\sqrt{e}\, a_\gamma^{-1} \lambda (1 + \log B) e^{-\lambda^2/(2a_\gamma)}.$

Remark 3.1. This result can be slightly refined for the special case
when the volatility process $\sigma_t$ for $t \in I$ is deterministic or
(conditionally) independent of the observations $R_t$ preceding $I$. Namely, in
such a situation the factor $4\sqrt{e}\, a_\gamma^{-1} \lambda (1 + \log B)$ in
the bound can be replaced by 2:

$P(|\tilde\theta_I - \theta_\tau| > \Delta_I + \lambda v_I) \le 2e^{-\lambda^2/(2a_\gamma)}.$

A similar remark applies to all the results that follow.

The result of this theorem bounds the loss of the estimate $\tilde\theta_I$
via the value $\Delta_I$ and the conditional standard deviation $v_I$. Under
homogeneity $\Delta_I = 0$ and the error of estimation is of order $v_I$.
Unfortunately, $v_I$ depends, in turn, on the target process $\theta_t$. One
would be interested in another bound which does not involve the unknown
function $\theta_t$. Namely, using (3.4) and assuming $\Delta_I$ small, one may
replace the conditional standard deviation $v_I$ by its estimate

$\tilde v_I = s_\gamma \tilde\theta_I |I|^{-1/2}.$
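
The local average (3.1) and the plug-in standard deviation above are easy to compute for the returns falling into one interval $I$. The sketch below is a minimal illustration, not the authors' implementation; the names `local_estimate` and `volatility_from_theta` are assumptions, and the second helper simply inverts the relation $\theta = C_\gamma \sigma^\gamma$.

```python
import math


def abs_moment(p):
    """E|xi|^p for a standard Gaussian xi."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def local_estimate(returns, gamma=0.5):
    """Return (theta_tilde, v_tilde) for the returns inside one interval I."""
    n = len(returns)
    c = abs_moment(gamma)
    s = math.sqrt(abs_moment(2.0 * gamma) - c * c) / c    # s_gamma = D_gamma / C_gamma
    theta = sum(abs(r) ** gamma for r in returns) / n     # local average (3.1)
    v = s * theta / math.sqrt(n)                          # s_gamma * theta_tilde * |I|^(-1/2)
    return theta, v


def volatility_from_theta(theta, gamma=0.5):
    """Invert theta = C_gamma * sigma^gamma to recover sigma."""
    return (theta / abs_moment(gamma)) ** (1.0 / gamma)
```

For $\gamma = 2$ the first value is the average of squared returns, and `volatility_from_theta` then returns its square root.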

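Theorem 3.2. Let $R_1, \ldots, R_\tau$ obey (2.1) and let (3.5) hold true.
Then, for the estimate $\tilde\theta_I$ of $\theta_\tau$, for every $D > 0$ and
$\lambda > 1$,

$P(|\tilde\theta_I - \theta_\tau| > \lambda' \tilde v_I,\ \Delta_I/v_I < D) \le 4\sqrt{e}\, \lambda (1 + \log B) e^{-\lambda^2/(2a_\gamma)},$

where $\lambda'$ solves

$\lambda + D = \lambda'/(1 + \lambda' s_\gamma |I|^{-1/2}).$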

3.2. Adaptive choice of the interval of homogeneity. Given observations
$R_1, \ldots, R_\tau$ following the time-inhomogeneous model (2.1), we aim to
estimate the current value of the parameter $\theta_\tau$ using the estimate
$\tilde\theta_I$ with a properly selected time interval $I$ of the form
$[\tau - m, \tau]$ to minimize the corresponding estimation error. Below we
discuss one approach which goes back to the idea of pointwise adaptive
estimation; see Lepski (1990), Lepski and Spokoiny (1997) and Spokoiny (1998).
The idea of the method can be explained as follows. Suppose $I$ is an interval
candidate; that is, we expect time homogeneity in $I$ and, hence, in every
subinterval of $I$. This particularly implies that the value $\Delta_I$ is
small and similarly for all $\Delta_J$, $J \subset I$, and that the mean values
of the $\theta_t$ over $I$ and over $J$ nearly coincide. Our adaptive procedure
roughly means the choice of the largest possible interval $I$ such that the
hypothesis that the value $\theta_t$ is a constant within $I$ is not rejected.
For testing this hypothesis, we consider the family of subintervals of $I$ of
the form $J = [\tau - m', \tau]$ with $m' < m$ and for every such subinterval
$J$ compare two different estimates: one is based on the observations from $J$,
and the other one is calculated from the complement
$I \setminus J = [\tau - m, \tau - m'[$. Theorems 3.1 and 3.2 can be used to
bound the difference $\tilde\theta_J - \tilde\theta_{I \setminus J}$ under
homogeneity within $I$. Indeed, the conditional variance of
$\tilde\theta_{I \setminus J} - \tilde\theta_J$ is $v_{I \setminus J}^2 + v_J^2$
and can be estimated by $\tilde v_{I \setminus J}^2 + \tilde v_J^2$. Thus, with
high probability it holds that

$|\tilde\theta_{I \setminus J} - \tilde\theta_J| < \lambda \sqrt{\tilde v_{I \setminus J}^2 + \tilde v_J^2},$

provided that $\lambda$ is sufficiently large. Therefore, if there exists a
testing interval $J \subset I$ such that the quantity
$|\tilde\theta_{I \setminus J} - \tilde\theta_J|$ is significantly positive,
then we reject the hypothesis of homogeneity for the interval $I$. Finally, our
adaptive estimate corresponds to the largest interval $I$ such that the
hypothesis of homogeneity is not rejected for $I$ itself and all smaller
considered intervals.

Now we present a formal description. Suppose a family $\mathcal{I}$ of
interval candidates $I$ is fixed. Each of them is of the form
$I = [\tau - m, \tau]$, $m \in \mathbb{N}$, so that the set $\mathcal{I}$ is
ordered due to $m$. With every such interval, we associate the estimate
$\tilde\theta_I$ of $\theta_\tau$ and the corresponding estimate $\tilde v_I$ of
the conditional standard deviation $v_I$.

Next, for every interval $I$ from $\mathcal{I}$ we assume there is a set
$\mathcal{J}(I)$ of testing subintervals $J$ [one example of these sets
$\mathcal{I}$ and $\mathcal{J}(I)$ is given in Section 6]. For every
$J \in \mathcal{J}(I)$ we construct the corresponding estimate $\tilde\theta_J$
(resp. $\tilde\theta_{I \setminus J}$) from the observations
$Y_t = |R_t|^\gamma$ for $t \in J$ (resp. for $t \in I \setminus J$) according
to (3.1) and compute $\tilde v_J$ (resp. $\tilde v_{I \setminus J}$).

Now, with a constant $\lambda$, define the adaptive choice of the interval of
homogeneity by the following iterative procedure:

Initialization. Select the smallest interval in $\mathcal{I}$.

Iteration. Select the next interval $I$ in $\mathcal{I}$ and calculate the
corresponding estimate $\tilde\theta_I$ and the estimated conditional standard
deviation $\tilde v_I$.

Testing homogeneity. Reject $I$ if there exists one $J \in \mathcal{J}(I)$
such that

(3.6)    $|\tilde\theta_{I \setminus J} - \tilde\theta_J| > \lambda \sqrt{\tilde v_{I \setminus J}^2 + \tilde v_J^2}.$

Loop. If $I$ is not rejected, then continue with the iteration step by
choosing a larger interval. Otherwise, set $\hat I$ = "the latest nonrejected
$I$."

The locally adaptive volatility estimate (LAVE) $\hat\theta_\tau$ of
$\theta_\tau$ is defined by applying this selected interval $\hat I$:

$\hat\theta_\tau = \tilde\theta_{\hat I}.$
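
The iterative procedure can be sketched in a few lines. The code below is an illustrative reading of the four steps, not the authors' implementation. Its assumptions: 0-based time indices; candidate intervals whose left end points lie on a regular grid of step `m0` (hypothetical default 10); testing subintervals $J$ that share the right end point $\tau$ and start at a later grid point; and the threshold `lam` supplied by the caller. The function name `lave` is likewise chosen here.

```python
import math


def abs_moment(p):
    """E|xi|^p for a standard Gaussian xi."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def lave(returns, tau, gamma=0.5, lam=2.4, m0=10):
    """Adaptive estimate of theta at index tau: (theta_hat, start of chosen interval)."""
    c = abs_moment(gamma)
    s = math.sqrt(abs_moment(2.0 * gamma) - c * c) / c     # s_gamma = D_gamma / C_gamma
    end = tau + 1                                          # exclusive end of every interval
    prefix = [0.0]                                         # prefix sums of Y_t = |R_t|^gamma
    for r in returns[:end]:
        prefix.append(prefix[-1] + abs(r) ** gamma)

    def estimate(a, b):
        """theta_tilde and v_tilde over the indices a, ..., b - 1."""
        theta = (prefix[b] - prefix[a]) / (b - a)
        return theta, s * theta / math.sqrt(b - a)

    # Candidate left end points on the grid, smallest interval first.
    starts = sorted(range(0, end - m0 + 1, m0), reverse=True)
    if not starts:
        raise ValueError("need at least m0 observations up to tau")
    chosen = starts[0]                                     # initialization
    for i in range(1, len(starts)):                        # iteration over larger intervals
        start = starts[i]
        rejected = False
        for split in starts[:i]:                           # J = [split, tau], I \ J = [start, split)
            theta_j, v_j = estimate(split, end)
            theta_c, v_c = estimate(start, split)
            if abs(theta_c - theta_j) > lam * math.sqrt(v_c ** 2 + v_j ** 2):  # test (3.6)
                rejected = True
                break
        if rejected:
            break                                          # keep the latest nonrejected interval
        chosen = start
    return estimate(chosen, end)[0], chosen
```

On a series whose absolute returns jump from 1 to 5 at index 60, a call such as `lave(series, tau=99)` selects the interval starting at the jump, because the candidate starting at index 50 is the first one to fail the test.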



The next section discusses the theoretical properties of the LAVE algorithm
in a general framework, while Section 6 gives a concrete example for the
choice of the sets $\mathcal{I}$, $\mathcal{J}(I)$ and the parameter
$\lambda$. This choice is then applied to simulated and real data.

4. Theoretical properties. In this section we collect some results
describing the quality of the proposed adaptive procedure.

4.1. Accuracy of the adaptive estimate. Let $\hat I$ be the interval selected
by our adaptive procedure. We aim to show that our adaptive choice is, up
to some constant factor in the losses, as good as the "ideal" choice
$\mathbb{I}$ that may utilize the knowledge of the volatility process
$\sigma_t$. This "ideal" choice can be defined by balancing the accuracy of
approximating the underlying process $\theta_t$ (which is controlled by
$\Delta_I$) and the stochastic error controlled by the stochastic standard
deviation $v_I$. By definition,
$v_I = s_\gamma |I|^{-1} (\sum_{t \in I} \theta_t^2)^{1/2}$, so that $v_I$
typically decreases when $|I|$ increases. For simplicity of notation we shall
suppose further that $v_I < v_J$ for $J \subset I$.

We do not give a formal definition of an "ideal" choice of the interval $I$
since there is no one universally optimal choice even if the process
$\theta_t$ is known. Instead, we consider a family of all "good" intervals
$\mathbb{I}$ such that the variability of the process $\theta_t$ inside
$\mathbb{I}$ is not too large compared to the conditional stochastic deviation
$v_{\mathbb{I}}$. This, due to Theorem 3.1, allows us to bound with high
probability the losses of the "ideal" estimate $\tilde\theta_{\mathbb{I}}$ by
$(D + \lambda) v_{\mathbb{I}}$ provided that
$\Delta_{\mathbb{I}}/v_{\mathbb{I}} < D$ and $\lambda$ is sufficiently large. A
similar property should hold for all smaller intervals
$I \subset \mathbb{I}$. Hence, it is natural to quantify the quality of the
interval $\mathbb{I}$ by

$\delta_{\mathbb{I}} = \sup_{I \in \mathcal{I}:\, I \subset \mathbb{I}} \Delta_I/v_I.$


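The next assertion claims that the risk of the adaptive estimate is not larger
in order than $v_{\mathbb{I}}$ for all $\mathbb{I}$ such that
$\delta_{\mathbb{I}}$ is sufficiently small.

Theorem 4.1. Let (3.5) hold true. Let an interval $\mathbb{I}$ be such that,
for some $D > 0$, it holds with positive probability $\delta_{\mathbb{I}} < D$.
Then

(4.1)    $P(\mathbb{I} \text{ is rejected},\ \delta_{\mathbb{I}} < D) \le \sum_{I \in \mathcal{I}(\mathbb{I})} \sum_{J \in \mathcal{J}(I)} 12\sqrt{e}\, \lambda_J (1 + \log B) e^{-(\lambda_J - D)^2/(2a_\gamma)},$

where $\lambda_J = \lambda(1 - s_\gamma \lambda N_J^{-1/2})$ with
$N_J = \min\{|J|, |I \setminus J|\}$.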

Moreover, if $N_J > 2 s_\gamma \lambda$ for all $J \in \mathcal{J}(I)$ and all
$I \in \mathcal{I}$, then it holds for the adaptive estimate
$\hat\theta = \tilde\theta_{\hat I}$ on the random set
$A = \{\mathbb{I} \text{ is not rejected},\ \delta_{\mathbb{I}} < D\}$:

$|\tilde\theta_{\mathbb{I}} - \hat\theta| \le 2\lambda \tilde v_{\mathbb{I}}$

and

$|\hat\theta - \theta_\tau| \le (D + 3\lambda + 2\lambda s_\gamma (D + \lambda) |\mathbb{I}|^{-1/2}) v_{\mathbb{I}}.$

Remark 4.1. It is easy to see that the sum on the right-hand side of the
bound (4.1) can be made arbitrarily small by proper choice of the constant
$\lambda$ and the sets $\mathcal{J}(I)$. Hence, the result of the theorem
claims that with a dominating probability a "good" interval $\mathbb{I}$ will
not be rejected and the adaptive estimate $\hat\theta$ is up to a constant
factor as good as any of the "good" estimates $\tilde\theta_{\mathbb{I}}$.

Remark 4.2. As mentioned in Remark 3.1, the probability bound on
the right-hand side of (4.1) can be refined for the special case when the
process $\theta_t$ is constant within $\mathbb{I}$ by replacing the factor
$12\sqrt{e}\, \lambda_J (1 + \log B) e^{-(\lambda_J - D)^2/(2a_\gamma)}$
by $6e^{-\lambda_J^2/(2a_\gamma)}$.

5. Change point model. A change point model is described by a sequence
$T_1 < T_2 < \cdots$ of stopping times with respect to the filtration
$\mathcal{F}_t$ and by values $\sigma_1, \sigma_2, \ldots$, where each
$\sigma_k$ is $\mathcal{F}_{T_k}$-measurable. By definition,
$\sigma_t = \sigma_k$ for $T_k \le t < T_{k+1}$ and $\sigma_t$ is constant for
$t < T_1$. This is an important special case of the model (2.1). For this
special case the above procedure has a very natural interpretation: when
estimating at the point $\tau$ we search for a largest interval of the form
$[\tau - m, \tau]$ that does not contain a change point. This is done via
testing for a change point within the candidate interval
$I = [\tau - m, \tau]$. Note that the classical maximum likelihood test for no
change point in the regression case with Gaussian $N(0, \sigma^2)$ errors is
also based on comparison of the mean values of observations $Y_t$ over the
subintervals $I \setminus J = [\tau - m, \tau - m']$ and every subinterval
$J = [\tau - m', \tau]$ for different $m'$, so
that the proposed procedure has strong appeal in this situation. However, 
there is an essential difference between testing for a change point and testing 
for homogeneity appearing as a building block of our adaptive procedure. 
Usually, a test for a change point is constructed in a way to provide the 
prescribed probability of a "false alarm," that is, rejecting the "no change 
point" hypothesis under homogeneity. Our adaptive procedure involves a 
lot of such tests for every candidate /, which leads to a multiple-testing 
problem. As a consequence, each particular test should be performed at a 
very high level; that is, it should be rather conservative providing a joint 
error probability at a reasonable level. 

5.1. Probability of a "false alarm." For the change point model, a "false
alarm" would mean that the candidate interval $I$ is rejected although the
hypothesis of homogeneity is still fulfilled. The arguments used in the proof
of Theorem 4.1 lead to the following upper bound for the probability of a
"false alarm":

Theorem 5.1. If $\mathbb{I} = [\tau - m, \tau]$ is an interval of
homogeneity, that is, $\theta_t = \theta_\tau$ for all $t \in \mathbb{I}$, then

$P(\mathbb{I} \text{ is rejected}) \le \sum_{I \in \mathcal{I}(\mathbb{I})} \sum_{J \in \mathcal{J}(I)} 6 \exp\Bigl(-\frac{\lambda^2}{2a_\gamma (1 + \lambda s_\gamma |J|^{-1/2})^2}\Bigr).$

This result is a special case of Theorem 4.1 with $\Delta_I = 0$ when taking
into account Remark 4.2.

Theorem 4.1 implies that for every fixed value $M$ there exists a fixed
$\lambda$ providing a prescribed upper bound $\alpha$ for the probability of a
"false alarm" for a homogeneous interval $\mathbb{I}$ of length $M$. Namely,
the choice

$\lambda > (1 + \varepsilon) \sqrt{2a_\gamma \log\frac{M}{m_0 \alpha}}$

leads for a proper small positive constant $\varepsilon > 0$ to the inequality

$\sum_{I \in \mathcal{I}(\mathbb{I})} \sum_{J \in \mathcal{J}(I)} 6 \exp\Bigl(-\frac{\lambda^2}{2a_\gamma (1 + \lambda s_\gamma |J|^{-1/2})^2}\Bigr) \le \alpha.$

Here, $M/m_0$ is approximately the number of intervals in $\mathcal{J}(I)$
(see Section 6.1). This bound is, however, very rough, and it is only of
theoretical importance since we estimate the probability of the union of
dependent events by the sum of single probabilities. The value of $\lambda$
providing a prescribed probability of a "false alarm" can be found by Monte
Carlo simulation for the homogeneous model with constant volatility as
described in Section 6.

5.2. Sensitivity to change points and the mean delay. The quality (sen- 
sitivity) of a change point procedure is usually measured by the mean delay 
between the occurrence of a change point and its detection. 

To study this property of the proposed method, we consider the case of
estimation at a point $\tau$ immediately after a change point $T_{cp}$. It is
convenient to suppose that $T_{cp}$ belongs to the end points of an interval
which is tested for homogeneity. In this case the "ideal" choice $\mathbb{I}$
is clearly $[T_{cp}, \tau]$. Theorem 4.1 claims that the quality of estimation
at $\tau$ is essentially the same as if we knew the latest change point
$T_{cp}$ a priori. In fact, one can state a slightly stronger assertion: every
interval $I$ which is essentially larger than $\mathbb{I}$ will be rejected
with high probability provided that the magnitude of the change is large
enough.

Denote $m' = |\mathbb{I}|$, that is, $m' = \tau - T_{cp}$. Let also
$I = [T_{cp} - m, \tau] = [\tau - m' - m, \tau]$ for some $m$, so that
$|I| = m + m'$, and let $\theta$ (resp. $\theta'$) denote the value of the
parameter $\theta_t$ before (resp. after) the change point $T_{cp}$. The
magnitude of the change point is measured by the relative change
$b = 2|\theta' - \theta|/\theta$.

It is worth mentioning that the values $\theta$ and especially $\theta'$ can be
random and dependent on past observations. For instance, $\theta'$ may depend
on $Y_t$ for any $t < T_{cp}$.

The interval $I$ will certainly be rejected if
$|\tilde\theta_{I \setminus \mathbb{I}} - \tilde\theta_{\mathbb{I}}|$ is
sufficiently large compared to the corresponding critical value.

Theorem 5.2. Let $E(Y_t \mid \mathcal{F}_{t-1}) = \theta$ before the change
point at $T_{cp}$ and $E(Y_t \mid \mathcal{F}_{t-1}) = \theta'$ after it, and
let $b = |\theta' - \theta|/\theta$. Let $I = [\tau - m' - m, \tau]$ with
$m' = \tau - T_{cp}$. If $\rho := \lambda s_\gamma/\sqrt{\min\{m, m'\}} < 1$ and

then $P(I \text{ is not rejected}) \le 4e^{-\lambda^2/(2a_\gamma)}$.

The result of Theorem 5.2 delivers some additional information about
the sensitivity of the proposed procedure to change points. One possible
question is about the minimal delay $m'$ between the change point $T_{cp}$ and
the first moment $\tau$ when the procedure starts to indicate this change
point by selecting an interval of type $\mathbb{I} = [T_{cp}, \tau]$. Due to
Theorem 5.2, the change will be "detected" with high probability if the value
$\rho = \lambda s_\gamma/\sqrt{m'}$ fulfills (5.1). With fixed $b > 0$,
condition (5.1) leads to $\rho < bC_0$ for some fixed constant $C_0$. The
latter condition can be rewritten in the form
$m' > b^{-2} \lambda^2 s_\gamma^2/C_0^2$. We see that this lower bound for the
required delay $m'$ is proportional to $b^{-2}$, where $b$ is the change point
magnitude. It is also proportional to the threshold $\lambda$ squared. In
turn, for the prescribed probability $\alpha$ of rejecting a homogeneous
interval of length $M$, the threshold $\lambda$ can be bounded by
$C\sqrt{\log(M/(m_0 \alpha))}$. In particular, if we fix the length $M$ and
$\alpha$, then $m' = O(b^{-2})$. If we keep fixed the values $b$ and $M$ but
aim to provide a very small probability of a "false alarm" by letting $\alpha$
go to 0, then $m' = O(\log \alpha^{-1})$. All these issues are in agreement
with the theory of change point detection; see, for example, Pollak (1985) and
Brodsky and Darkhovsky (1993).

6. LAVE in practice. The aim of this section is to give some hints
concerning the choice of the testing intervals and the smoothing parameter
$\lambda$ and to illustrate the performance of the LAVE procedure on simulated
and real data. We consider the simplest homogeneous model and we study the
stability of the procedure in such a situation. Then a change point model
is analyzed and the sensitivity with respect to the jump magnitude is
measured. Finally, LAVE is applied to a set of exchange rate data.

6.1. Choice of the sets $\mathcal{I}$ and $\mathcal{J}(I)$. The presented
algorithm involves the sets of interval candidates $\mathcal{I}$ and of
testing intervals $\mathcal{J}(I)$. The simplest proposal is based on the use
of a regular time grid $t_1, t_2, \ldots$, with grid step $m_0 \in \mathbb{N}$,
that is, $t_k = m_0 k$, $k = 1, 2, \ldots$. For a given time point $\tau$, the
set $\mathcal{I}$ of interval candidates is defined in the following way:

$\mathcal{I} = \{I_k = [t_k, \tau] : t_k < \tau - m_0,\ k = 1, 2, \ldots\}.$

Next, for every interval $I_k$, we define the set $\mathcal{J}(I_k)$ of
testing subintervals $J_{k'} \subset I_k$ such that $J_{k'} = [t_{k'}, \tau]$
for all $t_{k'} > t_k$ belonging to the grid. The homogeneity within $I_k$ is
then tested by comparing the pairs of estimates $\tilde\theta_J$ and
$\tilde\theta_{I_k \setminus J}$ for all $J \in \mathcal{J}(I_k)$.

In this construction the sets $\mathcal{I}$, $\mathcal{J}(I)$ are completely
determined by the grid step $m_0$. The value of $m_0$ should be selected as
small as possible, because it represents the minimal delay before the LAVE
algorithm can detect a change point. Nevertheless, $m_0$ should be
sufficiently large to provide stability of the estimates $\tilde v_J$ and
$\tilde v_{I \setminus J}$. For the simulation and the analysis of real data
we use $m_0 = 10$, which represents a good compromise. However, small changes
in this value, that is, $5 < m_0 < 20$, do not appear to have great influence
on the estimation results.

6.2. Choice of $\lambda$ and $\gamma$. The selection of $\gamma$ and, in
particular, $\lambda$ is more critical. Theorem 5.1 suggests that in the
context of a change point model, a reasonable approach for selecting
$\lambda$ is by providing a prescribed level $\alpha$ for rejecting a
homogeneous interval $I$ of a given length $M$. This would clearly imply at
most the same level $\alpha$ for rejecting a homogeneous interval of a smaller
length. However, the value of $\lambda$ which can be derived with the help of
Theorem 5.1 is rather conservative. A more accurate choice can be made by
Monte Carlo simulation. We examine the procedure described in Section 3.2
with the sets of intervals $\mathcal{I}$ and $\mathcal{J}(I)$ on the regular
grid with the fixed step $m_0 = 10$. A constant (and therefore also time
homogeneous) model assumes that the parameter $\theta_t$ does not vary in
time, that is, $\theta_t = \theta$. It can easily be seen that the value
$\theta$ has no influence on the procedure under time homogeneity. One can
therefore suppose that $\theta = 1$ and the original model (2.1) is
transformed into the regression model $Y_t = 1 + s_\gamma \zeta_t$ with
constant trend and homogeneous variance $s_\gamma^2$. This model is completely
described, and, therefore, one can determine by simulation the value of
$\lambda$ for which an interval of time homogeneity of length $M$ is not
rejected with a frequency of 95%.
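
A calibration of this kind can be sketched as follows. The code estimates, for a given $\lambda$, how often a homogeneous sample of length $M$ is rejected by at least one pair $(I, J)$ on the grid. It is an illustration under assumptions made here: the helper names `is_rejected` and `rejection_frequency`, the 0-based grid of left end points, the number of runs and the seed are all hypothetical, and the rejection event may differ in detail from the one the authors simulated, so the output is not claimed to reproduce Table 1.

```python
import math
import random


def abs_moment(p):
    """E|xi|^p for a standard Gaussian xi."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def is_rejected(y, lam, s, m0):
    """True if some pair (I, J) on the grid rejects homogeneity for the sample y."""
    n = len(y)
    prefix = [0.0]
    for value in y:
        prefix.append(prefix[-1] + value)
    starts = list(range(0, n - m0 + 1, m0))      # left end points; all intervals end at n
    for a in starts:                             # candidate I = [a, n)
        for b in starts:
            if b <= a:
                continue                         # J = [b, n) must be a proper subinterval
            theta_j = (prefix[n] - prefix[b]) / (n - b)
            theta_c = (prefix[b] - prefix[a]) / (b - a)
            v2 = s * s * (theta_j ** 2 / (n - b) + theta_c ** 2 / (b - a))
            if abs(theta_c - theta_j) > lam * math.sqrt(v2):
                return True
    return False


def rejection_frequency(lam, length=40, gamma=0.5, m0=10, runs=2000, seed=0):
    """Share of homogeneous samples of the given length that are rejected."""
    c = abs_moment(gamma)
    s = math.sqrt(abs_moment(2.0 * gamma) - c * c) / c
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        y = [abs(rng.gauss(0.0, 1.0)) ** gamma for _ in range(length)]
        hits += is_rejected(y, lam, s, m0)
    return hits / runs
```

Because the same seed produces the same samples for every $\lambda$, the estimated frequency is nonincreasing in $\lambda$, so a value with a rejection frequency near 5% can be located by bisection on `rejection_frequency`.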

The values of $\lambda$ are computed for $M = 40$ and 80 and for the power
transformations $\gamma = 0.5$, 1.0 and 2.0. The results are shown in Table 1.
Note that the values of $\lambda$ calibrated for $M = 80$ are necessarily
larger and therefore more conservative than the values of $\lambda$ calibrated
for $M = 40$.

Table 1
The value of $\lambda$ which, for a given power transformation $\gamma$,
provides the rejection of an interval of time homogeneity of length $M$ with a
frequency of 5%

Smoothing parameter
              $\gamma = 0.5$        $\gamma = 1.0$        $\gamma = 2.0$
$M$           80        40          80        40          80        40
$\lambda$     2.74      2.40        2.58      2.24        2.18      1.86


6.3. Simulation results for the change point model. We now evaluate the
performance of the LAVE algorithm on simulated data. Two change point
time series of length $T = 240$ are considered. The simulated data display
two jumps of the same magnitude in opposite directions: $\sigma_t = \sigma$ for
$t \in [1, 80]$ and $t \in [161, 240]$ and $\sigma_t = \sigma'$ for $t \in [81, 160]$, where $\sigma = 1$
and $\sigma' = 3$ and $5$, respectively. For each model 500 realizations are
generated, and the estimation is performed at each time point $t \in [t_0, 240]$,
where $t_0$ is set equal to 20.
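The simulation design above can be reproduced with a few lines of code. This is a minimal sketch that assumes Gaussian innovations, so that the returns are the volatility path multiplied by standard normal draws; the function names and the seed are illustrative only.

```python
import numpy as np

def change_point_sigma(sigma=1.0, sigma_prime=3.0, T=240):
    """Volatility path: sigma on [1, 80] and [161, 240], sigma_prime on [81, 160]."""
    sig = np.full(T, sigma, dtype=float)
    sig[80:160] = sigma_prime        # zero-based slice for t = 81, ..., 160
    return sig

def simulate_returns(sig, n_rep=500, seed=0):
    """Draw n_rep independent realizations R_t = sigma_t * xi_t (xi_t assumed N(0, 1))."""
    rng = np.random.default_rng(seed)
    return sig * rng.standard_normal((n_rep, len(sig)))

# Small jump (sigma' = 3) and large jump (sigma' = 5) designs.
small_jump = simulate_returns(change_point_sigma(1.0, 3.0))
large_jump = simulate_returns(change_point_sigma(1.0, 5.0))
```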

We compute the estimation error for each combination of $\gamma$ and $\lambda$ with
the following criterion:

(6.1) $$\sum_{t=20}^{240} \sum_{\omega=1}^{500} \left( \frac{\hat\sigma_{t,\omega} - \sigma_t}{\sigma_t} \right)^2,$$

where the index $\omega$ indicates the realizations of the change point model. We
note that in (6.1) the quadratic error is divided by the true volatility so
that the criterion does not depend on the scale of $\sigma_t$. The results shown in
Table 2 are favorable to the choice of the smaller value of $\gamma$, confirming that
the loss of efficiency caused by $\gamma < 2$ is offset by the greater normality of
the errors. Figures 2 and 3 show the results of the estimation for the power
transformation $\gamma = 0.5$ and the value of $\lambda$ calibrated for an interval of time
homogeneity of length $M = 40$ and $M = 80$, respectively. The plots on the
top display the true process (straight line), the empirical median among all
estimates (thick dotted line) and the empirical quartiles among all estimates
(thin dotted lines). The plots on the bottom similarly display the length of
the interval of time homogeneity, which is minimal (resp. maximal) just after
(resp. just before) a change point, and the median and the quartiles among
all estimates.
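As an illustration of the error criterion, the sketch below sums the squared relative errors over time points and realizations. It assumes the estimates are stored as an array with one row per realization; the function name and array layout are hypothetical.

```python
import numpy as np

def estimation_error(sigma_hat, sigma_true, t0=20):
    """Sum of squared relative errors over t = t0, ..., T and all realizations.

    sigma_hat:  array of shape (n_rep, T) with the volatility estimates
    sigma_true: array of shape (T,) with the true volatility path
    t0:         first time point (1-based) at which estimation is performed
    """
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    sigma_true = np.asarray(sigma_true, dtype=float)
    rel = (sigma_hat[:, t0 - 1:] - sigma_true[t0 - 1:]) / sigma_true[t0 - 1:]
    return float(np.sum(rel ** 2))
```

Dividing by the true volatility makes the value invariant under a common rescaling of the estimates and the true path, which is the property noted in the text.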

The results are satisfactory. The volatility is estimated precisely and the
change points are quickly detected. As expected, the behavior of the method
within homogeneous regions is very stable. The delay in detecting a change
point becomes smaller as the jump size grows. Taking a smaller $\lambda$ also results
in a smaller delay and improves the quality of estimation after the change
points. The results for other power transformations look very similar and
therefore are not displayed.

Table 2
Estimation errors for all the combinations of parameters γ and λ

Estimation error
                    γ = 0.5                 γ = 1.0                 γ = 2.0
Parameter      λ = 2.74   λ = 2.40     λ = 2.58   λ = 2.24     λ = 2.18   λ = 1.86
Small jump     19,241.9   17,175.3     19,121.2   16,522.5     24,887.2   17,490.9
Large jump     46,616.2   43,282.5     51,363.9   46,706.4     68,730.7   55,706.3

Fig. 2. Estimation results for the change point model. The upper plots show the values of
the standard deviation, while the lower plots show the values of the interval of homogeneity
at each time point. True values (solid line), median of all estimates (thick dotted line),
upper and lower quartiles (thin dotted lines). The value of $\lambda$ for $\gamma = 0.5$ and $M = 40$ has
been used. [Panels: true and estimated volatility; true and estimated interval of homogeneity.]

Fig. 3. Estimation results for the jump model. The value of $\lambda$ for $\gamma = 0.5$ and $M = 80$
has been used. [Panels: true and estimated volatility; true and estimated interval of homogeneity.]

6.4. Estimation of exchange rate volatility. We apply the LAVE procedure
to a set of nine exchange rates, which are available from the web site
http://federalreserve.gov of the U.S. Federal Reserve. The data sets
represent daily exchange rates of the U.S. dollar (USD) against the following
currencies: Australian dollar (AUD), British pound (BPD), Canadian dollar
(CAD), Danish krone (DKR), Japanese yen (JPY), Norwegian krone (NKR),
New Zealand dollar (NZD), Swiss franc (SFR) and Swedish krona (SKR).
The period under consideration goes from January 1, 1990, to April 7, 2000.
See Table 3.

Table 3
Summary statistics

Currency      n     Mean × 10^5   Variance × 10^5   Skewness   Kurtosis
AUD         2583      -10.41           3.191          -0.187      8.854
BPD         2583       -0.679          3.530          -0.279      5.792
CAD         2583        8.819          0.895           0.042      5.499
DKR         2583        6.097          4.201          -0.037      4.967
JPY         2583      -12.70           5.486          -0.585      7.366
NKR         2583        9.493          4.251           0.313      8.630
NZD         2583       -6.581          3.604          -0.356     49.17
SFR         2583        1.480          5.402          -0.186      4.526
SKR         2583       12.66           4.615           0.372      9.660

Fig. 4. Exchange rate returns, estimated standard deviation and estimated interval of
time homogeneity. The value of $\lambda$ for $\gamma = 0.5$ and $M = 80$ has been used.
[Panels: JPY/USD returns and BPD/USD returns; estimated volatility; estimated interval of homogeneity.]

Fig. 5. ACF of the absolute values of the exchange rate returns and ACF of the absolute
values of the exchange rate returns standardized by LAVE.
[Panels: JPY/USD absolute returns; JPY/USD absolute returns standardized by LAVE.]

All the time series show qualitatively almost the same pattern; therefore,
we provide the graphical example only for the two representative exchange
rates JPY/USD and BPD/USD (Figure 4). The empirical mean of
the returns is close to 0, while the empirical kurtosis is larger than 3.
Furthermore, variance clustering and persistence of the autocorrelation of the
squared returns are also visible. The estimated standard deviation is nicely in
accordance with the development of the volatility and, in particular, sharp
changes in the volatility tend to be quickly recognized. Note also that the
variability of the estimated interval of time homogeneity appears to grow
as the estimated interval becomes larger. This is a feature of the algorithm
because the number of tests grows with the accepted interval, so that a
rejection becomes more probable. Nevertheless, this variability does not strongly
affect the estimated volatility coefficient. Figure 5 shows the significantly
persistent autocorrelation of the absolute returns, together with the
autocorrelation of the absolute returns divided by the estimated standard deviation.
The autocorrelation of the standardized absolute returns is no longer
significant, and this fact supports the choice of a locally homogeneous model
in order to explain the data.
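The autocorrelation check on absolute returns, before and after standardization, can be sketched as follows. This is an illustration only: `sigma_hat` stands for any series of volatility estimates aligned with the returns (for instance LAVE output), and both function names are hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag=20):
    """Sample autocorrelation function of x for lags 0, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def acf_before_after(returns, sigma_hat, max_lag=20):
    """ACF of |R_t| and of |R_t| / sigma_hat_t, for a side-by-side comparison."""
    abs_r = np.abs(np.asarray(returns, dtype=float))
    standardized = abs_r / np.asarray(sigma_hat, dtype=float)
    return sample_acf(abs_r, max_lag), sample_acf(standardized, max_lag)
```

A common rough guide is to compare the estimated autocorrelations with the band $\pm 1.96/\sqrt{n}$, where $n$ is the sample size; values inside the band are usually regarded as not significant.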

A benchmark model. As a matter of comparison, we also consider a model 
which is commonly used to estimate and forecast volatility processes: the 
GARCH(1, 1) model proposed by Bollerslev (1986): 



$$\sigma_t^2 = \omega + \alpha R_{t-1}^2 + \beta \sigma_{t-1}^2.$$



Among all parametric volatility models, it represents the most common spec- 
ification: "The GARCH(1, 1) is the leading generic model for almost all asset 
classes of returns. ... it is quite robust and does most of the work in almost 
all cases" [Engle (1995)]. 
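For readers who want to see the GARCH(1, 1) recursion in executable form, here is a minimal sketch of the variance update for given parameters. Parameter estimation is not shown; the initialization by the sample variance is an assumption made for this example, and the function name is hypothetical.

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta, sigma2_init=None):
    """Conditional variances from sigma2_t = omega + alpha * R_{t-1}^2 + beta * sigma2_{t-1}.

    Returns an array of length len(returns) + 1; the last entry is the
    one-step-ahead variance forecast.
    """
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(r) + 1)
    # Assumption for this sketch: start the recursion at the sample variance.
    sigma2[0] = np.var(r) if sigma2_init is None else sigma2_init
    for t in range(len(r)):
        sigma2[t + 1] = omega + alpha * r[t] ** 2 + beta * sigma2[t]
    return sigma2
```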

We do not require the parameters to be constant throughout the whole 
sample, but, similarly to Franses and van Dijk (1996), we consider a rolling 
estimate. We thus fit the model to a sample of 350 observations, generate 
the forecast, delete the first observation from the sample and add the next 
one. Such a procedure reduces the harmful effect of possible parameter shifts 
on the forecasting performances of the model, even if at the same time it 
may increase the estimation variability. 
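The rolling scheme can be written as a short loop. In this sketch `forecaster` is a hypothetical callback that takes a window of past returns and returns a one-step-ahead forecast (for example, by fitting a model to the window); the window length of 350 follows the text.

```python
import numpy as np

def rolling_forecasts(returns, forecaster, window=350):
    """One-step-ahead forecasts from a rolling window of fixed length.

    The s-th forecast uses observations s, ..., s + window - 1 and targets
    observation s + window; the window then moves forward by one step.
    """
    r = np.asarray(returns, dtype=float)
    return np.array([forecaster(r[s:s + window])
                     for s in range(len(r) - window)])
```

For example, `rolling_forecasts(r, lambda w: np.mean(w ** 2))` would use the average squared return of each window as a naive variance forecast.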

Table 4
Forecast performance of LAVE relative to GARCH

                 γ = 0.5            γ = 1.0            γ = 2.0
Currency     M = 80   M = 40    M = 80   M = 40    M = 80   M = 40
AUD          0.942    0.945     0.963    0.962     0.991    0.982
BPD          0.961    0.960     0.979    0.970     1.006    1.000
CAD          0.974    0.979     0.989    0.992     1.010    0.997
DKR          0.978    0.980     0.985    0.987     1.010    1.004
JPY          0.951    0.949     0.971    0.966     1.006    0.997
NKR          0.961    0.957     0.972    0.965     0.998    0.984
NZD          0.878    0.879     0.904    0.902     0.952    0.947
SFR          0.985    0.984     0.992    0.990     1.004    1.000
SKR          0.965    0.961     0.973    0.968     0.982    0.977

The volatility is a hidden process which can be observed only together
with a multiplicative error; therefore, the evaluation of the forecasting
performance of an algorithm is not straightforward. Due to the model (2.1),
$E(R_{t+1}^2 \mid \mathcal{F}_t) = \sigma_{t+1}^2$. Therefore, given a forecast $\hat\sigma_{t+1|t}^2$, the empirical mean
value of $|R_{t+1}^2 - \hat\sigma_{t+1|t}^2|^p$ can be used to measure the quality of this forecast.
The forecast ability of the LAVE and the GARCH estimates is therefore
evaluated with the following criterion:



$$\frac{1}{T - t_0} \sum_{t=t_0}^{T} \bigl| R_{t+1}^2 - \hat\sigma_{t+1|t}^2 \bigr|^p \qquad \text{with } p = 0.5.$$



The value of $p = 0.5$ is chosen instead of the more common $p = 2$ because
we are interested in a robust criterion which is not too sensitive to the
presence of outliers. The relative performance of the LAVE and the GARCH
estimates is displayed in Table 4. The performance of the LAVE approach
is clearly better; furthermore, the table gives a clear hint for the choice of
the power transformation. Indeed, $\gamma = 0.5$ provides the smallest forecasting
errors, while $\gamma = 2.0$ leads to the largest forecasting errors, which are
sometimes larger than those of the GARCH model.
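The forecast criterion and the relative comparison can be sketched as below. The sketch assumes that the realized returns and the variance forecasts are already aligned, and that the relative performance is the ratio of the LAVE loss to the GARCH loss, so that values below 1 favor LAVE; the function names are hypothetical.

```python
import numpy as np

def forecast_loss(returns_next, sigma2_forecast, p=0.5):
    """Empirical mean of |R_{t+1}^2 - forecast_{t+1|t}|^p over the evaluation period."""
    r2 = np.asarray(returns_next, dtype=float) ** 2
    f = np.asarray(sigma2_forecast, dtype=float)
    return float(np.mean(np.abs(r2 - f) ** p))

def relative_performance(returns_next, lave_forecast, garch_forecast, p=0.5):
    """Ratio of the LAVE loss to the GARCH loss (assumed convention: < 1 favors LAVE)."""
    return (forecast_loss(returns_next, lave_forecast, p)
            / forecast_loss(returns_next, garch_forecast, p))
```

With $p = 0.5$ a single very large squared return contributes only through its square root, which is why the criterion is less sensitive to outliers than the quadratic choice $p = 2$.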

7. Conclusions and outlook. The locally adaptive volatility estimate (LAVE) 
is described and analyzed in this paper. It provides a nonparametric way for 
estimating and short-term forecasting the volatility of financial returns. 

It is assumed that a local constant approximation of the volatility process 
holds over some unknown interval. The issue of filtering this interval of time 
homogeneity out of the return time series is considered, and a nonparametric 
approach is presented. The estimate of the volatility process is then found 
by averaging over the interval of time homogeneity. 

A theoretical analysis of the properties of the LAVE algorithm is provided 
and the problem of selecting the smoothing parameters is analyzed through 
Monte Carlo simulation. The estimation results on change point models 
show that the method has reasonable performance in practice. An empirical
application to exchange rate returns and a comparison with a GARCH(1, 1) 
also provide good evidence that the new method is competitive and can even 
outperform the standard parametric models, especially for forecasting with 
a short horizon. 

An important feature of the proposed method is that it allows for a
straightforward extension to multivariate volatility estimation; see
Härdle, Herwartz and Spokoiny (2000) for a detailed discussion.

Obviously, if the underlying conditional distribution is not normal, the es- 
timated volatility can give only partial information about the riskiness of the 
asset. Recent developments in risk analysis tend to focus on the estimation 
of the quantiles of the distribution. In this direction, the LAVE procedure 
can be used as a convenient tool for prewhitening the returns and obtaining 
a sample of "almost" identical and independently distributed returns, which 
do not display any more variance clustering. Therefore, the usual techniques 
of quantile estimation could be applied in a static framework. We regard 
such a development as a topic for future research. 

8. Proofs. In this section, we collect the proofs of the results stated 
above. We begin by considering some useful properties of the power trans- 
formation introduced in Section 2.1. 

Some properties of the power transformation. Let $g_\gamma(u)$ be the moment
generating function of $\zeta_\gamma = D_\gamma^{-1}(|\xi|^\gamma - C_\gamma)$:

$$g_\gamma(u) = E e^{u \zeta_\gamma}.$$

It is easy to see that this function is finite for $\gamma < 2$ and all $u$ and for $\gamma = 2$
and $u < 1$. For $\gamma = 1/2$, the function $2u^{-2} \log g_\gamma(u)$ is plotted in Figure 6.

Lemma 8.1. For every $\gamma \le 1$ there exists a constant $a_\gamma > 0$ such that

(8.1) $$\log E e^{u \zeta_\gamma} \le \frac{a_\gamma^2 u^2}{2}.$$

Fig. 6. The log-Laplace transform of $\zeta_{1/2}$ divided by the log-Laplace transform of a
standard normal r.v.
Proof. It is easy to check that the function $g_\gamma(u)$ with $\gamma \le 1$ is positive
and smooth (infinitely many times differentiable). Moreover, the function
$h_\gamma(u) = \log g_\gamma(u)$ is also smooth and satisfies $h_\gamma(0) = h_\gamma'(0) = 0$, $h_\gamma''(0) =
E\zeta_\gamma^2 = 1$. This yields that $u^{-2} h_\gamma(u) = u^{-2} \log g_\gamma(u)$ is bounded on every finite
interval of the positive semiaxis $[0, \infty)$. It therefore remains to show that

$$\lim_{u \to \infty} u^{-2} \log E e^{u \zeta_\gamma} < \infty.$$

Since $\zeta_\gamma = D_\gamma^{-1}(|\xi|^\gamma - C_\gamma)$, it suffices to bound $u^{-2} \log E e^{u |\xi|^\gamma / D_\gamma}$. For every
$t > 0$,

< ^ut-<D-^ _^ 2Ee"^*^"'^^' 
Next, with t = u^'^'^'^' and 7 < 1, for u -^ cxd, 



n-2loge"*"^-"'=u-V2^ 



-1 



7 



u-^ log e"'*'^-'^" = u-^'-^y-'D-^ -. 0. 
For $\gamma = 1$, the last expression remains bounded and the assertion follows.
□

For $\gamma = 1/2$, condition (8.1) is satisfied with $a_\gamma = 1.005$.

The next technical statement is a direct consequence of Lemma 8.1. 

Lemma 8.2. Let $c_t$ be a predictable process w.r.t. the filtration $\mathcal{F} = (\mathcal{F}_t)$;
that is, every $c_t$ is a function of previous observations $R_1, \ldots, R_{t-1}$: $c_t =
c_t(R_1, \ldots, R_{t-1})$. Then the process

$$\mathcal{E}_t = \exp\Bigl( \sum_{s=1}^{t} c_s \zeta_s - \frac{a_\gamma^2}{2} \sum_{s=1}^{t} c_s^2 \Bigr)$$

is a supermartingale, that is,

(8.2) $$E(\mathcal{E}_t \mid \mathcal{F}_{t-1}) \le \mathcal{E}_{t-1}.$$

The next result has been stated in Liptser and Spokoiny (2000) for Gaussian
martingales; however, the proof is based only on the property (8.2) and
allows for straightforward extension to sums of the form $M_t = \sum_{s=1}^{t} c_s \zeta_s$.
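Theorem 8.1. Let $M_t = \sum_{s=1}^{t} c_s \zeta_s$ with predictable coefficients $c_s$. Then
let $T$ be fixed or a stopping time. For every $b > 0$, $B \ge 1$ and $\lambda \ge 1$,

$$P\bigl( |M_T| > \lambda \sqrt{\langle M \rangle_T},\ b \le \sqrt{\langle M \rangle_T} \le bB \bigr) \le 4 \sqrt{e}\, \lambda (1 + \log B) e^{-\lambda^2/(2a_\gamma^2)},$$

where

$$\langle M \rangle_T = \sum_{t=1}^{T} c_t^2.$$

Remark 8.1. If the coefficients $c_t$ are deterministic or independent
of $M$, then Lemma 8.1 and the Chebyshev inequality yield

$$P\bigl( |M_T| > \lambda \sqrt{\langle M \rangle_T} \bigr) \le 2 e^{-\lambda^2/(2a_\gamma^2)}.$$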

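Proof of Theorem 3.1. Define

$$\xi_I = \frac{1}{|I|} \sum_{t \in I} s_\gamma \theta_t \zeta_t.$$

Then $\tilde\theta_I = \theta_I + \xi_I$. By the definition of $\Delta_I$,

(8.3) $$|\theta_I - \theta_\tau| \le \Delta_I.$$

Next, by (3.2),

$$\tilde\theta_I - \theta_\tau = \theta_I - \theta_\tau + \xi_I,$$

and the use of (8.3) yields

$$P\bigl( |\tilde\theta_I - \theta_\tau| > \Delta_I + \lambda v_I \bigr) \le P\Bigl( \Bigl| \sum_{t \in I} \theta_t \zeta_t \Bigr| > \lambda \Bigl( \sum_{t \in I} \theta_t^2 \Bigr)^{1/2} \Bigr).$$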



In addition, if the volatility coefficient $\sigma_t$ satisfies $b \le \sigma_t^2 \le bB$ with some
positive constants $b, B$, then the conditional variance $v_I^2 = s_\gamma^2 |I|^{-2} \sum_{t \in I} \theta_t^2$
satisfies

$$b' |I|^{-1} \le v_I^2 \le b' |I|^{-1} B,$$

with $b' = b s_\gamma^2$. Now the assertion follows from (3.5) and Theorem 8.1. □

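Proof of Theorem 3.2. It suffices to show that the inequalities $\Delta_I / v_I \le
D$ and

(8.4) $$|\xi_I| = |\tilde\theta_I - \theta_I| \le \lambda v_I$$

imply $|\tilde\theta_I - \theta_\tau| \le \lambda' \tilde v_I$, where $\lambda'$ solves the equation
$D + \lambda = \lambda' / (1 + \lambda' s_\gamma |I|^{-1/2})$.
This would yield the desired result by Theorem 8.1; compare the proof of
Theorem 3.1.

Lemma 8.3. Let $(\Delta_I / v_I) s_\gamma |I|^{-1/2} < 1$. Under (8.4),

$$\tilde v_I \ge v_I \Bigl( \sqrt{1 - (\Delta_I / v_I)^2 s_\gamma^2 |I|^{-1}} - s_\gamma \lambda |I|^{-1/2} \Bigr) \ge v_I \bigl( 1 - s_\gamma |I|^{-1/2} (\Delta_I / v_I + \lambda) \bigr).$$

Proof. By the definition of $\tilde v_I$ in view of (8.4),

$$\tilde v_I = s_\gamma \tilde\theta_I |I|^{-1/2} \ge s_\gamma (\theta_I - \lambda v_I) |I|^{-1/2}.$$

Since $\theta_I$ is the arithmetic mean of $\theta_t$ over $I$,

$$\sum_{t \in I} (\theta_t - \theta_I)^2 \le \sum_{t \in I} (\theta_t - \theta_\tau)^2 \le \Delta_I^2 |I|.$$

Next,

$$s_\gamma^{-2} |I|^2 v_I^2 = \sum_{t \in I} \theta_t^2 = |I| \theta_I^2 + \sum_{t \in I} (\theta_t - \theta_I)^2,$$

so that

$$\theta_I \ge s_\gamma^{-1} |I|^{1/2} v_I \sqrt{1 - (\Delta_I s_\gamma / v_I)^2 |I|^{-1}}.$$

Hence, under (8.4),

$$\tilde v_I \ge v_I \Bigl( \sqrt{1 - (\Delta_I s_\gamma / v_I)^2 |I|^{-1}} - s_\gamma \lambda |I|^{-1/2} \Bigr),$$

and the assertion follows. □

The bound (8.4) and the definition of $\Delta_I$ imply

$$|\tilde\theta_I - \theta_\tau| \le |\theta_I - \theta_\tau| + |\tilde\theta_I - \theta_I| \le \Delta_I + \lambda v_I \le (D + \lambda) v_I.$$

By Lemma 8.3, $\tilde v_I \ge v_I (1 - s_\gamma D |I|^{-1/2} - s_\gamma \lambda |I|^{-1/2})$. Thus,

$$|\tilde\theta_I - \theta_\tau| \le \frac{(D + \lambda)\, \tilde v_I}{1 - s_\gamma (D + \lambda) |I|^{-1/2}} = \lambda' \tilde v_I,$$

as required. □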

Proof of Theorem 4.1. Let I be a "good" interval in the sense that, 
with high probability, Aj/vj < D for some nonnegative constant D and ev- 
ery J G v7(I). First we show that I will not be rejected with high probability 
provided that A is sufficiently large. 

We proceed similarly as in the proofs of Theorems 3.1 and 3.2. The pro- 
cedure involves the estimates 9j, Onj and the differences 9j — Onj for all 
/ € X(I) and all J £ -Jil)- The expansion 9j = 9j + ^j implies 

Oj - 0i\j = (Oj - 9i\j) + (o - 6\j). 
Under the condition 6i < D, 

\Sj-^i\j\ 
Define the events 



< A/ < Dvi < Djvj + vj^j. 



^/= U \\0-^i\j\<{Xj-D)^v^j + v 



2 
I\J 



and 



'^^^^>l-s,XNj"' 



\-'j + 



i\J 



A, 



U ^i^ 

/el: /CI 



where A^j = min{|J|,|/\ J|} and Xj = X{1 - s^XNj^^^). 
Define Af = ^i n {5i <D}. On this set 



\ej 



^^J + ^'aj 



VI ^ |gj-g/\j| + lO-e/\j| 



^J + ^AJ 



<(D + Aj-L»), 



^'j + «A J ^ _D + A J - Z? 



^ ^J + ^\j 1 - s^Aiv;'/' 



A. 



It is easy to see that the conditional variance of ^j — ^/^j is equal to v^j + vj-.j. 
Arguing similarly to Lemma 8.3 and Theorem 3.1, we bound, with Xj^d = 



P{Aj)< Y. p{^-^>\j,d) 



J€J{I) 



P(^^>A,,z,Upf4^^M>A,,^ 



vi\j 



^j + ^/V 



< Y. 12VeAj(l + logS)e~^W(2'^^), 
JeJii) 

and the first assertion of the theorem follows. 

Now we show that on the set Aj the estimate 6 = 9j satisfies \6 — 9i\ < 
2Xvi. 

Due to the above, on A^ the interval I will not be rejected and, hence 
|/| > |I|. Let I be an arbitrary interval from I which is not rejected by the 
procedure. By construction I is one of the testing intervals for /. Denote 
J = I\I. Note that |/|(^/ - 6*1) = | J|(^j - Oi), so that the event "/ is not 



rejected" imphes \6j — 6i\ < Awuj + v^ and 

\9i-ei\<^.Jij7dl<^{vj + vi). 

The use of vj = s^9j\J\~^''^ and \9i — 9i\ < X{vj + vi) yields 
implying 

1^/2 + AS^, _ , _ JJ|V2 + |]I|l/2 



'' - |J|V2_A./^' ^^ + ^^ ^ |J|V2_A./^- 



Therefore, 



It is straightforward to check that the function /(x) = x'^{x + l)/[(rE^ + 
l)(x — c)] with any c > satisfies /(x) < 2 for all x > 2c. This implies with 

x = |J|i/2/|]i|i/2 andc = As^/|I|i/2 that 

\fti-ei\<2\vi 

under the condition that | Jp^ > 2As^. 

Let Aj < Dvi. Similarly to Lemma 8.3, vi < fn(l + s^(D + A)|I|~^/^) and, 
by Theorem 3.1, l^i — 6-j-\ <{D + \)vi. This yields 

\ei - ei\ < 2At;i(l + s^{D + A)|I|-i/2) 

and 

1/2^ 



\ei - 0r\ < 2A?;i(l + s^{D + A)|ir^/") + {D + X)vi 



{D + 3X + 2\s^{D + X)\l\-^/^)vi 



as required. D 



Proof of Theorem 5.2. To simplify the exposition we suppose that 
ft = \. (This does not restrict generality since one can always normalize each 
"observation" Ij by ft.) We also suppose that 6' > \ and b = ft' — 1. (The case 
when ft' < 6 can be considered similarly.) Finally, we assume that m' = m. 
(One can easily see that this case is the most difficult one.) We again apply 
the decomposition 

ej = i + ^j, e, = ft' + ^r, 
see the proof of Theorem 3.1. Hence 

e,-ej = b+Cj-ij. 

It is straightforward to see that v'j = si/m and Vj = siO' /m. By Lemma 8.1 
(see also Remark 8.1) 

and it suffices to check that the inequahties \^j\ < Xvj, \S^i\ < Xvj and (5.1) 
imply 



\Oj-ei\>x^vj + vf. 

Since 9' — 1 = b and since vj = Sy\J\~^''^6j and similarly for vj, we have 
under the conditions |Ol ^ Xvj, |^i| < Xvi, 



m 



vj 



VI- 



with p = m~^''^Xs-f. Therefore, 



(i+ej)<A-V(i+p), 
;i+6)<A^V(i+p), 



\0j - ei\ - X^v"] + vj > 6(1 -5)-2p- V2p{l + p) > 
in view of (5.1), and the assertion follows. D 